AITopics | linear combination

linear combination

TADA: Improved Diffusion Sampling with Training-free Augmented DynAmics

Neural Information Processing Systems | Jun-21-2026, 13:26:33 GMT

Diffusion models have demonstrated exceptional capabilities in generating high-fidelity images but typically suffer from inefficient sampling. Many solver designs and noise scheduling strategies have been proposed to dramatically improve sampling speeds. In this paper, we introduce a new sampling method that is up to 186% faster than the current state-of-the-art solver at comparable FID on ImageNet512. This new sampling method is training-free and uses an ordinary differential equation (ODE) solver. The key to our method lies in using higher-dimensional initial noise, which makes it possible to produce more detailed samples with fewer function evaluations from existing pretrained diffusion models. In addition, by design our solver allows the level of detail to be controlled through a simple hyper-parameter at no extra computational cost.

artificial intelligence, diffusion model, machine learning, (15 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

In Context Compositional Learning via Sparse Coding Transformer

Neural Information Processing Systems | Jun-16-2026, 17:36:20 GMT

Transformer architectures have achieved remarkable success across language, vision, and multimodal tasks, and there is growing demand for them to address in-context compositional learning tasks. In these tasks, models solve the target problems by inferring compositional rules from context examples, which are composed of basic components structured by underlying rules. However, some of these tasks remain challenging for Transformers, which are not inherently designed to handle compositional tasks and offer limited structural inductive bias. In this work, inspired by the principle of sparse coding, we propose a reformulation of the attention to enhance its capability for compositional tasks. In sparse coding, data are represented as sparse combinations of dictionary atoms with coefficients that capture their compositional rules.
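The sparse coding idea mentioned in the abstract, representing data as a sparse combination of dictionary atoms, can be illustrated with a minimal sketch. This uses matching pursuit, a standard greedy sparse coding algorithm, on a hand-made toy dictionary; it is not the attention reformulation proposed in the paper, and the names `matching_pursuit`, `atoms`, and `signal` are assumptions chosen for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, atoms, n_nonzero):
    """Greedy sparse coding: explain `signal` with at most `n_nonzero` atoms.

    Assumes every atom has unit Euclidean norm (an assumption of this sketch).
    Returns the coefficient vector and the unexplained residual.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with what is still unexplained.
        scores = [dot(residual, atom) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        if abs(scores[k]) < 1e-12:
            break  # nothing left to explain
        coeffs[k] += scores[k]
        residual = [r - scores[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


# Toy dictionary: the standard basis of R^3 plus one diagonal atom.
s = 2 ** -0.5
atoms = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [s, s, 0.0],
]
signal = [2.0, 2.0, 0.5]
coeffs, residual = matching_pursuit(signal, atoms, n_nonzero=2)
```

In this toy case the signal is a combination of only two atoms (the diagonal atom and the third basis vector), so two greedy steps leave a residual that is zero up to rounding error.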

large language model, machine learning, natural language, (20 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)

More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations

Wang, Mingze, Wang, Jinbo, Xia, Yikuan, Shen, Kai, Zhong, Shu

arXiv.org Machine Learning | May-27-2026

Feedforward network (FFN) layers account for a large fraction of parameters and nonlinear expressivity in Transformer-based large language models (LLMs). Despite the evolution from ReLU and GELU to gated variants such as SwiGLU, most FFN designs still use a single fixed activation function, applying the same nonlinear transformation to all tokens. In this work, we propose Mixture of Activations (MoA), a token-adaptive FFN design that mixes a dictionary of activation functions using lightweight input-dependent gates while sharing the same linear projections. As an input-independent counterpart, we also introduce learnable activations (LA), which form linear combinations of activation functions for both ReLU-type and SwiGLU-type FFNs. Theoretically, we establish strict finite-width expressive separations among fixed-activation FFNs, LA, and MoA: LA strictly contains fixed-activation FFNs, while MoA strictly contains LA, with the additional expressivity arising from input-dependent nonlinear hybridization. Empirically, we evaluate MoA through extensive pre-training experiments on dense and MoE language models ranging from 0.12B to 2B parameters under different token budgets, optimizers, and learning rate schedules. MoA consistently achieves lower terminal loss and exhibits more favorable scaling behavior than well-tuned baselines, with minimal parameter and computational overhead. These results suggest that token-adaptive activation mixing is a simple and effective mechanism for improving FFN expressivity in LLMs.
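The difference between learnable activations (LA) and Mixture of Activations (MoA) described in the abstract can be sketched as follows. This is an illustrative reading of the abstract only: the activation dictionary, the softmax gate, the per-token (rather than per-unit) gating, and all function names are assumptions, and the paper's actual parameterization may differ.

```python
import math


def relu(z):
    return max(0.0, z)


def silu(z):
    return z / (1.0 + math.exp(-z))


# Assumed dictionary of activation functions shared by LA and MoA.
ACTIVATIONS = [relu, math.tanh, silu]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(v):
    m = max(v)
    e = [math.exp(t - m) for t in v]
    total = sum(e)
    return [t / total for t in e]


def mixed_ffn(x, W_in, W_out, weights):
    """FFN whose activation is sum_k weights[k] * phi_k, applied elementwise.

    The linear projections W_in and W_out are shared by all activations.
    """
    hidden = matvec(W_in, x)
    mixed = [sum(w * phi(h) for w, phi in zip(weights, ACTIVATIONS)) for h in hidden]
    return matvec(W_out, mixed)


def la_ffn(x, W_in, W_out, alpha):
    """Learnable activations: one fixed linear combination for every token."""
    return mixed_ffn(x, W_in, W_out, alpha)


def moa_ffn(x, W_in, W_out, W_gate):
    """Mixture of activations: mixing weights are computed from the token x."""
    gates = softmax(matvec(W_gate, x))
    return mixed_ffn(x, W_in, W_out, gates)


# Toy weights for a 2 -> 2 -> 1 FFN and a single token.
W_in = [[1.0, -1.0], [0.5, 2.0]]
W_out = [[1.0, 1.0]]
x = [1.0, 2.0]
relu_only = la_ffn(x, W_in, W_out, [1.0, 0.0, 0.0])
uniform_la = la_ffn(x, W_in, W_out, [1 / 3, 1 / 3, 1 / 3])
uniform_moa = moa_ffn(x, W_in, W_out, [[0.0, 0.0]] * 3)
```

With `alpha = [1, 0, 0]` the LA layer reduces to a plain ReLU FFN, and with an all-zero gate matrix the MoA layer reduces to LA with uniform weights, which mirrors the containment ordering stated in the abstract.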

large language model, machine learning, tanh, (19 more...)

2605.26647

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Symbolic Regression via Neural Networks

Boddupalli, Nibodh, Matchen, Timothy, Moehlis, Jeff

arXiv.org Machine Learning | May-7-2026

Machine learning - specifically deep learning - techniques have shown their capabilities in approximating dynamics from data, but a shortcoming of traditional deep learning is that there is little insight into the underlying mapping beyond its numerical output for a given input. This limits their utility in analysis beyond simple prediction. Simultaneously, a number of strategies exist which identify models based on a fixed dictionary of basis functions, but most either require some intuition or insight about the system, or are susceptible to overfitting or a lack of parsimony. Here we present a novel approach that combines the flexibility and accuracy of deep learning approaches with the utility of symbolic solutions: a deep neural network that generates a symbolic expression for the governing equations. We first describe the architecture for our model, then show the accuracy of our algorithm across a range of classical dynamical systems.

The dynamics of quantities of interest are widely modeled as differential equations, often derived from first principles. However, this is not always possible, especially when … The identification of models from data has seen significant advances with the advent of machine learning. While deep neural networks have enabled sufficient accuracy in forecasting dynamic data with unprecedented versatility, the models they represent lack closed-form expressions that can be conducive to interpretation and analysis.

A number of authors have approached system identification by fitting coefficients of a linear combination of basis functions, dating at least back to Crutchfield and McNamara [3]. The set of basis functions typically includes nonlinear terms, for example terms which would arise in a Taylor series expansion about the origin of the system [3-6] or a broader class of functions [7]. The coefficients of the basis functions are determined through comparison of the original data points with points from computed solutions to the fitted models. Vari…
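The classical approach referred to above, fitting the coefficients of a linear combination of basis functions, can be sketched with ordinary least squares. Note the simplification: this sketch regresses sampled derivatives directly on the basis functions rather than comparing data with simulated solutions of the fitted model, and it is not the neural architecture proposed in the paper. The polynomial basis, the synthetic system dx/dt = x - x^3, and all function names are assumptions chosen for illustration.

```python
def solve_linear(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    c = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (M[i][n] - tail) / M[i][i]
    return c


def fit_coefficients(xs, dxdts, basis):
    """Least-squares coefficients c with dx/dt ~= sum_j c[j] * basis[j](x)."""
    Phi = [[phi(x) for phi in basis] for x in xs]
    m = len(basis)
    # Normal equations: (Phi^T Phi) c = Phi^T y.
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(m)] for i in range(m)]
    b = [sum(row[i] * y for row, y in zip(Phi, dxdts)) for i in range(m)]
    return solve_linear(A, b)


# Assumed basis: Taylor-like monomials 1, x, x^2, x^3.
basis = [lambda x: 1.0, lambda x: x, lambda x: x ** 2, lambda x: x ** 3]
xs = [-2.0 + 0.25 * i for i in range(17)]
dxdts = [x - x ** 3 for x in xs]  # synthetic, noise-free derivative samples
coeffs = fit_coefficients(xs, dxdts, basis)
```

On this noise-free data the recovered coefficients are close to (0, 1, 0, -1), which reads back as dx/dt = x - x^3. With noisy data or a larger dictionary, the overfitting and parsimony issues mentioned in the abstract become relevant.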

artificial intelligence, machine learning, trajectory, (18 more...)

doi: 10.1063/5.0134464

2605.04337

Country: North America > United States > California (0.28)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

1b9812b99fe2672af746cefda86be5f9-Supplemental.pdf

Neural Information Processing Systems | May-1-2026, 01:51:30 GMT

artificial intelligence, machine learning, nullnull, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

Towards Lower Bounds on the Depth of ReLU Neural Networks

Neural Information Processing Systems | May-1-2026, 01:51:26 GMT

We contribute to a better understanding of the class of functions that is represented by a neural network with ReLU activations and a given architecture. Using techniques from mixed-integer optimization, polyhedral theory, and tropical geometry, we provide a mathematical counterbalance to the universal approximation theorems which suggest that a single hidden layer is sufficient for learning tasks. In particular, we investigate whether the class of exactly representable functions strictly increases by adding more layers (with no restrictions on size). This problem has potential impact on algorithmic and statistical aspects because of the insight it provides into the class of functions represented by neural hypothesis classes. However, to the best of our knowledge, this question has not been investigated in the neural network literature. We also present upper bounds on the sizes of neural networks required to represent functions in these neural hypothesis classes.
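To make "exactly representable" concrete, here is a small sketch of a well-known identity: the maximum of two numbers is computed exactly, not just approximated, by a network with a single hidden layer of three ReLU units. The example is an illustration of exact representation in general and does not reproduce any result from the paper; the function name `max_via_relu` is an assumption.

```python
def relu(z):
    return max(0.0, z)


def max_via_relu(x, y):
    """Exact max(x, y) from one hidden ReLU layer.

    Hidden units: relu(x - y), relu(y), relu(-y).
    Output weights: (+1, +1, -1), using max(x, y) = relu(x - y) + y
    and y = relu(y) - relu(-y).
    """
    hidden = [relu(x - y), relu(y), relu(-y)]
    return hidden[0] + hidden[1] - hidden[2]


# Check the identity on a small grid of exactly representable floats.
grid = [-2.0, -0.5, 0.0, 1.0, 3.5]
mismatches = [
    (x, y) for x in grid for y in grid if max_via_relu(x, y) != max(x, y)
]
```

The depth question studied in the abstract asks whether functions exist that need more hidden layers than this for an exact representation, regardless of how wide each layer is allowed to be.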

artificial intelligence, machine learning, neural network, (17 more...)

Country:

Europe (0.46)
North America > United States (0.28)
Asia > India (0.28)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

4ffbd5c8221d7c147f8363ccdc9a2a37-Supplemental.pdf

Neural Information Processing Systems | Apr-25-2026, 21:09:19 GMT

artificial intelligence, machine learning, psnr, (16 more...)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Appendix 1 Interpretation using rank-1 Nyström approximation

Neural Information Processing Systems | Apr-24-2026, 14:55:17 GMT

The bound in Equation 5 of the main paper can be interpreted using a rank-1 Nyström approximation for $f(x_t, x_t)$. By holding $w$ fixed and maximizing over $q$ on the right-hand side of Equation 5, we get $q = f(w,w)^{\dagger} \sum_t y_t f(x_t, w)$, where $f(w,w)^{\dagger}$ denotes the pseudo-inverse. Typically the weight vector $w$, often called a "landmark", used in the Nyström approximation is chosen either as a random input or by more sophisticated schemes such as k-means. In our case, we directly optimize the landmarks via Equation 6 in the main paper. To our knowledge, the only other work to do this is Fu [2014]. The code used in the main training loop of our algorithm is shown in Figure 1.
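A rank-1 Nyström approximation with a single landmark can be sketched as below. The RBF kernel standing in for f, the toy data, and the function names are assumptions made for illustration; the sketch only evaluates the closed-form q for a fixed landmark and does not reproduce the landmark optimization of Equation 6 or the training loop of Figure 1.

```python
import math


def rbf(a, b, gamma=1.0):
    """Assumed stand-in for the kernel f: a Gaussian RBF kernel."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def scalar_pinv(value):
    """Pseudo-inverse of a 1x1 matrix: reciprocal, or 0 if the value is 0."""
    return 1.0 / value if value != 0.0 else 0.0


def nystrom_rank1(xs, w, kernel=rbf):
    """Approximate K[s][t] = f(x_s, x_t) by f(x_s, w) * pinv(f(w, w)) * f(w, x_t)."""
    pinv = scalar_pinv(kernel(w, w))
    col = [kernel(x, w) for x in xs]
    return [[cs * pinv * ct for ct in col] for cs in col]


def optimal_q(xs, ys, w, kernel=rbf):
    """Closed form from the text: q = pinv(f(w, w)) * sum_t y_t * f(x_t, w)."""
    pinv = scalar_pinv(kernel(w, w))
    return pinv * sum(y * kernel(x, w) for x, y in zip(xs, ys))


# Toy inputs; the landmark is set to one of the data points.
xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
ys = [1.0, -1.0, 1.0]
w = xs[1]
K_approx = nystrom_rank1(xs, w)
q = optimal_q(xs, ys, w)
```

When the landmark coincides with a data point, the approximation reproduces that point's row and column of the kernel matrix exactly; entries far from the landmark are approximated poorly, which is why landmark placement matters.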

approximation, artificial intelligence, machine learning, (18 more...)

Country: North America > United States > New York > New York County > New York City (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
